49 research outputs found

    DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

    Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Accurate identification of these disordered regions therefore has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with an RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, DisPredict was trained and tested with independent test datasets. The results were consistent, with both the training and test error minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions, as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0
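
    The abstract names a support vector machine with an RBF kernel but gives no parameters or features. As a minimal sketch only, the general RBF kernel and the standard SVM decision function can be written as below. The function names, the toy support vectors, the coefficients, and the value of gamma are all illustrative assumptions and are not taken from DisPredict.

```python
import math


def rbf_kernel(x, y, gamma):
    """General RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Standard SVM decision function: f(x) = sum_i coef_i * K(sv_i, x) + bias.

    The sign of f(x) gives the predicted class. Which sign would mean
    "disordered" is a labeling convention assumed here, not stated in the abstract.
    """
    total = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return total + bias


# Hypothetical toy model with two support vectors (made-up values).
toy_support_vectors = [[0.0, 0.0], [2.0, 0.0]]
toy_dual_coefs = [1.0, -1.0]
toy_bias = 0.0
toy_gamma = 1.0
```

    With these toy values, a point near the first support vector receives a positive decision value and a point near the second receives a negative one; gamma controls how quickly a support vector's influence decays with distance.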

    A Recommender System for Adaptive Examination Preparation using Pearson Correlation Collaborative Filtering

    Distance learning is any form of remote instruction in which the student is not physically present for the lesson. It is booming thanks to the power of the Internet. Distance learning plays a vital role in examination preparation, where multiple choice questions can be used to evaluate the performance of students. A Multiple Choice Question (MCQ) is a type of examination question in which, usually, four options are given along with the question and the student has to choose the correct answer. This research includes a simulation model built to keep learners studying the subjects they might be weak in. We have developed a methodology that may guide a student to improve his/her areas of weakness by using a recommender system based on the Pearson Correlation Collaborative Filtering approach. The paper describes a recommender system that keeps track of a learner's profile and creates an adaptive training mechanism using the performance matrix.
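
    The abstract does not spell out how the performance matrix is used. The following is a minimal sketch of generic user-based collaborative filtering with Pearson correlation, assuming the matrix holds per-topic scores in [0, 1] for each learner. The learner names, topics, scores, and the mean-centred prediction formula are illustrative assumptions, not the paper's method.

```python
import math


def pearson(u, v):
    """Pearson correlation between two learners over the topics both attempted."""
    common = [t for t in u if t in v]
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[t] for t in common) / len(common)
    mean_v = sum(v[t] for t in common) / len(common)
    num = sum((u[t] - mean_u) * (v[t] - mean_v) for t in common)
    den_u = math.sqrt(sum((u[t] - mean_u) ** 2 for t in common))
    den_v = math.sqrt(sum((v[t] - mean_v) ** 2 for t in common))
    den = den_u * den_v
    return num / den if den else 0.0


def predict(scores, learner, topic):
    """Predict a learner's score on an unattempted topic.

    Uses the learner's own mean plus the similarity-weighted, mean-centred
    deviations of the other learners who attempted the topic.
    """
    target = scores[learner]
    base = sum(target.values()) / len(target)
    num = 0.0
    den = 0.0
    for other, row in scores.items():
        if other == learner or topic not in row:
            continue
        weight = pearson(target, row)
        other_mean = sum(row.values()) / len(row)
        num += weight * (row[topic] - other_mean)
        den += abs(weight)
    return base + num / den if den else base


# Hypothetical performance matrix: learner -> {topic: fraction of MCQs correct}.
scores = {
    "ana": {"algebra": 0.9, "geometry": 0.4, "calculus": 0.8},
    "ben": {"algebra": 0.8, "geometry": 0.3, "calculus": 0.7, "statistics": 0.2},
    "cy": {"algebra": 0.3, "geometry": 0.9, "calculus": 0.4, "statistics": 0.9},
}
predicted_statistics = predict(scores, "ana", "statistics")
```

    In this toy matrix "ana" correlates strongly with "ben", who scored poorly on statistics, so the predicted score falls below her own average. A recommender built this way could prioritise topics with low predicted scores for extra practice.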

    High Occurrence of Zoonotic Subtypes of Cryptosporidium parvum in Cypriot Dairy Farms

    Cryptosporidium parvum is one of the major causes of neonatal calf diarrhoea resulting in reduced farm productivity and compromised animal welfare worldwide. Livestock act as a major reservoir of this parasite, which can be transmitted to humans directly and/or indirectly, posing a public health risk. Research reports on the prevalence of Cryptosporidium in ruminants from east Mediterranean countries, including Cyprus, are limited. This study is the first to explore the occurrence of Cryptosporidium spp. in cattle up to 24 months old on the island of Cyprus. A total of 242 faecal samples were collected from 10 dairy cattle farms in Cyprus, all of which were screened for Cryptosporidium spp. using nested-PCR amplification targeting the small subunit of the ribosomal RNA (18S rRNA) gene. The 60 kDa glycoprotein (gp60) gene was also sequenced for the samples identified as Cryptosporidium parvum-positive to determine the subtypes present. The occurrence of Cryptosporidium was 43.8% (106/242) with at least one positive isolate in each farm sampled. Cryptosporidium bovis, Cryptosporidium ryanae and C. parvum were the only species identified, while the prevalence per farm ranged from 20% to 64%. Amongst these, the latter was the predominant species, representing 51.8% of all positive samples, followed by C. bovis (21.7%) and C. ryanae (31.1%). Five C. parvum subtypes were identified, four of which are zoonotic: IIaA14G1R1, IIaA15G1R1, IIaA15G2R1 and IIaA18G2R1. IIaA14G1R1 was the most abundant, representing 48.2% of all C. parvum positive samples, and was also the most widespread. This is the first report of zoonotic subtypes of C. parvum circulating in Cyprus. These results highlight the need for further research into the parasite focusing on its diversity, prevalence, host range and transmission dynamics on the island.

    Critical assessment of protein intrinsic disorder prediction

    Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
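
    Fmax is commonly defined as the maximum F1 score obtained over all decision thresholds applied to a predictor's per-residue scores. The sketch below illustrates that general definition on made-up data. The exact CAID protocol (how thresholds are swept and how residues are pooled across proteins) is not given in the abstract, so the function name, the threshold sweep, and the toy labels and scores are assumptions.

```python
def f_max(labels, scores):
    """Maximum F1 over thresholds; a residue is predicted positive if score >= threshold.

    labels: 1 for disordered residues, 0 for ordered residues.
    scores: predicted disorder propensities, one per residue.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    best = 0.0
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Made-up residues for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_f_max = f_max(toy_labels, toy_scores)
```

    Because Fmax picks the best threshold after the fact, it summarises the ranking quality of the scores rather than the quality of any single default cutoff.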

    Estimation of Position Specific Energy as a Feature of Protein Residues from Sequence Alone for Structural Classification.

    A set of features computed from the primary amino acid sequence of proteins is crucial in the process of inducing a machine learning model that is capable of accurately predicting three-dimensional protein structures. Solutions for existing protein structure prediction problems are in need of features that can capture the complexity of molecular level interactions. With a view to this, we propose a novel approach to estimate position specific estimated energy (PSEE) of a residue using contact energy and predicted relative solvent accessibility (RSA). Furthermore, we demonstrate that PSEE can be reasonably estimated based on sequence information alone. PSEE is useful in identifying the structured as well as the unstructured or intrinsically disordered regions of a protein by computing favorable and unfavorable energy respectively, characterized by an appropriate threshold. The most intriguing finding, verified empirically, is the indication that the PSEE feature can effectively classify disordered versus ordered residues and can segregate residues of different secondary structure types by computing the constituent energies. PSEE values for each amino acid strongly correlate with the hydrophobicity value of the corresponding amino acid. Further, PSEE can be used to detect the existence of critical binding regions that essentially undergo disorder-to-order transitions to perform crucial biological functions. Towards an application of disorder prediction using the PSEE feature, we have rigorously tested and found that a support vector machine model informed by a set of features including PSEE consistently outperforms a model with an identical set of features with PSEE removed. In addition, the new disorder predictor, DisPredict2, shows competitive performance in predicting protein disorder when compared with six existing disordered protein predictors.

    ROC curves given by DisPredict for the probability prediction per residue while the training is performed with (A) SL477 and (B) MxD444 dataset.

    In each figure, the solid (blue) curve corresponds to the cross validation test on the same dataset and the dotted (red) curve corresponds to the independent test. The AUC values given in each figure correspond to the values in Table 6 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0141551#pone.0141551.t006). The x-axis and y-axis show the Specificity and Sensitivity, respectively.

    Density distribution curves of monograms and bigrams for (A) SL477 and (B) MxD444 dataset.

    The x-axis and y-axis show the monograms/bigrams in logarithmic scale and density index of the distribution, respectively. For each figure, the dotted (red) and solid (blue) vertical lines correspond to median values of the distribution for monograms (MG) and bigrams (BG), respectively.

    Comparative predictive quality of DisPredict with MFDp on MxD444 dataset and SPINE-D on SL477 dataset.

    1. 5-fold cross validation performance of MFDp on MxD dataset of 514 protein chains [56] (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0141551#pone.0141551.ref056).
    2. 10-fold cross validation performance of DisPredict on MxD444, which is a subset of 444 chains out of 514 chains with no X-tag.
    3. 10-fold cross validation performance of DisPredict on SL477.
    4. 10-fold cross validation performance of SPINE-D [47] (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0141551#pone.0141551.ref047) on SL477.

    Name and definition of performance measuring parameters.
